
In this paper we present some experimental results on the classification of natural 
language documents using Kohonen's self-organizing-map neural network paradigm. We 
discuss, in particular, how the classification accuracy can be improved if the standard
keyword representation of documents is enhanced by including specific weights, 
thesaurally-defined relations among keywords, and additional synonyms for keywords. We 
sketch the main features of a prototype of an automatic document classification ... 
Latent Semantic Indexing (LSI) is an information retrieval method that organizes
information into a semantic structure that takes advantage of some of the implicit higher- 
order associations of words with text objects. The resulting structure reflects the major 
associative patterns in the data while ignoring some of the smaller variations that may be 
due to idiosyncrasies in the word usage of individual documents. This permits retrieval 
based on the "latent" semantic content of t ... 
This paper reports the results of a preliminary experiment on the detection of semantic
variants of terms in a French technical document. The general goal of our work is to help
structure terminologies. Two kinds of semantic variants can be found in
traditional terminologies: strict synonymy links and fuzzier relations like see-also. We 
have designed three rules which exploit general dictionary information to infer synonymy 
relations between complex candidate terms. The results ... 

Use of syntactic context to produce term association lists for text retrieval
Gregory Grefenstette 

June 1992 Proceedings of the 15th annual international ACM SIGIR conference on 
Research and development in information retrieval 
One aspect of world knowledge essential to information retrieval is knowing when two
words are related. Knowing word relatedness allows a system given a user's query terms
to retrieve relevant documents not containing those exact terms. Two words can be said
to be related if they appear in the same contexts. Document co-occurrence gives a
measure of word relatedness that has proved to be too rough to be useful. The relatively
recent appearance of on-line dictionaries and robust and rapid par ...
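The notion that two words are related when they appear in the same contexts can be illustrated with a small sketch. This is not the paper's system: it assumes the (word, context) pairs have already been extracted (for example by a parser), the context labels are hypothetical, and overlap is scored with a plain unweighted Jaccard coefficient rather than any weighting the paper may use.

```python
from collections import defaultdict


def context_sets(pairs):
    """Group (word, context) pairs into a set of contexts per word."""
    contexts = defaultdict(set)
    for word, ctx in pairs:
        contexts[word].add(ctx)
    return contexts


def jaccard(a, b):
    """Unweighted Jaccard overlap between two context sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_words(pairs, target, top_n=3):
    """Rank other words by the share of contexts they have in common with target."""
    contexts = context_sets(pairs)
    # Look the target up once with .get() so the dict is not modified while iterating.
    target_ctx = contexts.get(target, set())
    scores = [
        (other, jaccard(target_ctx, ctxs))
        for other, ctxs in contexts.items()
        if other != target
    ]
    # Highest overlap first; ties broken alphabetically for a stable order.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_n]


# Hypothetical contexts written as "relation:head" labels.
pairs = [
    ("dog", "subj-of:bark"), ("dog", "obj-of:feed"), ("dog", "mod:loyal"),
    ("cat", "obj-of:feed"), ("cat", "mod:loyal"), ("cat", "subj-of:purr"),
    ("car", "obj-of:drive"), ("car", "mod:fast"),
]
print(related_words(pairs, "dog"))
```

With these toy pairs, "cat" shares two of four distinct contexts with "dog" and ranks first, while "car" shares none.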

This paper presents an overview of current research concerning knowledge extraction 
from technical texts. In particular, the use of empirical techniques during the identification 
and generation of a semantic representation is considered. A key step is the discovery of 
useful n-grams and correlations between clusters of these n-grams. 

Keywords: knowledge representation, language understanding, large text corpora 
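The first part of the key step mentioned above, discovering candidate n-grams, can be sketched as a simple count over a token stream. This is a minimal illustration under an assumption: a raw frequency threshold (`min_count`, a hypothetical parameter) stands in for whatever notion of "useful" the overview applies, and correlations between clusters of n-grams are not shown.

```python
from collections import Counter


def frequent_ngrams(tokens, n, min_count=2):
    """Count contiguous n-grams in a token list and keep those seen at least min_count times."""
    # zip over n staggered views of the list yields each window of n consecutive tokens.
    grams = Counter(zip(*(tokens[i:] for i in range(n))))
    return {gram: count for gram, count in grams.items() if count >= min_count}


# Hypothetical fragment of a technical text.
tokens = "the pump valve failed and the pump valve was replaced".split()
print(frequent_ngrams(tokens, 2))
print(frequent_ngrams(tokens, 3))
```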
The graph-traversal approach to hypertext information retrieval is a conceptualization of
hypertext in which the structural aspects of the nodes are emphasized. A user navigates
through such hypertext systems by evaluating the semantics associated with links
between nodes as well as the information contained in nodes. [Fris88] In this paper we
describe a hierarchical structure which effectively supports the graphical traversal of a
document collection in a hypertext system ... 
In a multidatabase system, schematic conflicts between two objects are usually of interest
only when the objects have some semantic similarity. We use the concept of semantic
proximity, which is essentially an abstraction/mapping between the domains of the two 
objects associated with the context of comparison. An explicit though partial context 
representation is proposed and the specificity relationship between contexts is defined. 
The contexts are organized as a meet semi-l ... 
James Pitkow, Peter Pirolli

March 1997 Proceedings of the SIGCHI conference on Human factors in computing
The problem of analyzing and classifying conceptual schemas is becoming increasingly
important due to the availability of a large number of schemas related to existing
applications. The purposes of schema analysis and classification activities can be different:
to extract information on intensional properties of legacy systems in order to restructure
or migrate to new architectures; to build libraries of reference conceptual components to 
be used in building new applications in a given domai ... 

Keywords: conceptual modeling, reference components, schema classification, schema 
21 An interface for navigating clustered document sets returned by queries
22 Enriched knowledge representation for information retrieval
F. N. Teskey 

November 1987 Proceedings of the 10th annual international ACM SIGIR conference 
on Research and development in information retrieval 

Publisher: ACM Press 


In this paper we identify the need for a new theory of information. An information model 
is developed which distinguishes between data, as directly observable facts, information, 
as structured collections of data, and knowledge as methods of using information. The 
model is intended to support a wide range of information systems. In the paper we 
develop the use of the model for a semantic information retrieval system using the 
concept of semantic categories. The likely benefits of this area ... 

23 Text categorization for multiple users based on semantic features from a machine-readable dictionary

Elizabeth D. Liddy, Woojin Paik, Edmund S. Yu

July 1994 ACM Transactions on Information Systems (TOIS), volume 12 issue 3 

Publisher: ACM Press 
The text categorization module described here provides a front-end filtering function for
the larger DR-LINK text retrieval system [Liddy and Myaeng 1993]. The model evaluates
a large incoming stream of documents to determine which documents are sufficiently
similar to a profile at the broad subject level to warrant more refined representation and
matching. To accomplish this task, each substantive word in a text is first categorized
using a feature set based on the semantic Subject Field ... 

Keywords: semantic vectors, subject field coding 



24 Document retrieval and text retrieval: An overview of DR-LINK and its approach to
document filtering 

Elizabeth D. Liddy, Woojin Paik, Edmund S. Yu, Kenneth A. McVearry 

March 1993 Proceedings of the workshop on Human Language Technology HLT '93 

Publisher: Association for Computational Linguistics 


DR-LINK is an information retrieval system, complex in design and processing, with the 
potential for providing significant advances in retrieval results due to the range and 
richness of semantic representation done by the various modules in the system. By using 
a full continuum of linguistic-conceptual processing, DR-LINK has the capability of 
producing documents which precisely match users' needs. Each of DR-LINK's six
processing modules adds to the conceptual enhancement of the document and que ...
December 1989 Proceedings of the 13th annual international ACM SIGIR conference
on Research and development in information retrieval 

Publisher: ACM Press 
Term clustering and syntactic phrase formation are methods for transforming natural
language text. Both have had only mixed success as strategies for improving the quality 
of text representations for document retrieval. Since the strengths of these methods are 
complementary, we have explored combining them to produce superior representations. 
In this paper we discuss our implementation of a syntactic phrase generator, as well as 
our preliminary experiments with producing phrase clusters. Th ... 



26 Contextualizing the information space in federated digital libraries
M. P. Papazoglou, J. Hoppenbrouwers
March 1999 ACM SIGMOD Record, volume 28 issue 1

Publisher: ACM Press 


Rapid growth in the volume of documents, their diversity, and terminological variations
render federated digital libraries increasingly difficult to manage. Suitable abstraction 
mechanisms are required to construct meaningful and scalable document clusters, 
forming a cross-digital library information space for browsing and semantic searching. 
This paper addresses the above issues, proposes a distributed semantic framework that 
achieves a logical partitioning of the information space accordi ... 



27 Adaptive document clustering 
C. T. Yu, Y. T. Wang, C. H. Chen

June 1985 Proceedings of the 8th annual international ACM SIGIR conference on
Research and development in information retrieval 
28 Structured hypertext with domain semantics
One important facet of current hypertext research involves using knowledge-based
techniques to develop and maintain document structures. A semantic net is one such 
technique. However, most semantic-net-based hypertext systems leave the linking 
consistency of the net to individual users. Users without guidance may accidentally 
introduce structural and relational inconsistencies in the semantic nets. The relational 
inconsistency hinders the creation of domain information models. The structura ... 

Keywords: graph theory, hypertext models, hypertext structures 



29 What if there were desktop access to the computer science literature? 

Dennis J. Brueni, Baziley T. Cross, Edward A. Fox, Lenwood S. Heath, Deborah Hix, Lucy T. Nowell, William C. Wake

March 1993 Proceedings of the 1993 ACM conference on Computer science 

Publisher: ACM Press 
What if there were an electronic computer science library? Consider the possibilities of
having your favorite publications available within finger's reach. Consider Project Envision, 
an ongoing effort to build a user-centered database from the computer science literature. 
This paper describes the project's first year progress, stressing the underlying motivation, 
user-centered development, and the overall design. 

30 A self-organizing semantic map for information retrieval 
Xia Lin, Dagobert Soergel, Gary Marchionini

September 1991 Proceedings of the 14th annual international ACM SIGIR conference
on Research and development in information retrieval 
31 Guessing morphology from terms and corpora
Christian Jacquemin
SIGIR '97, volume 31 Issue SI
In a new method for automatic indexing and retrieval, implicit higher-order structure in
the association of terms with documents is modeled to improve estimates of term-document
association, and therefore the detection of relevant documents on the basis of
terms found in queries. Singular-value decomposition is used to decompose a large
term-by-document matrix into 50 to 150 orthogonal factors from which the original matrix can
be approximated by linear combination; both documents and terms ...
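The truncated singular-value decomposition described above can be sketched with NumPy's `numpy.linalg.svd`. This is a hedged toy illustration, not the authors' implementation: the 5-term by 4-document matrix, the query, and `k=2` are assumptions (the abstract's 50 to 150 factors apply to large collections), and both documents and query are projected with the same leading left singular vectors, which is only one of several LSI conventions for comparing them.

```python
import numpy as np


def rank_k_approximation(term_doc, k):
    """Best rank-k approximation (in the least-squares sense) of a term-by-document matrix."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


def lsi_scores(term_doc, query, k):
    """Cosine score of each document against a query in a k-factor LSI space.

    Assumes no document and not the query projects to the zero vector.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    u_k = u[:, :k]
    # Project documents and query onto the k leading left singular vectors.
    # For documents this equals diag(s[:k]) @ vt[:k, :].
    docs_k = u_k.T @ term_doc          # shape (k, n_docs)
    query_k = u_k.T @ query            # shape (k,)
    norms = np.linalg.norm(docs_k, axis=0) * np.linalg.norm(query_k)
    return (query_k @ docs_k) / norms


# Hypothetical toy collection: rows are terms, columns are documents d0..d3.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)
q = np.array([1, 0, 0, 0, 0], dtype=float)  # query: "car"
print(lsi_scores(A, q, k=2))
```

Here d1 contains "automobile" and "engine" but not "car", so its raw term overlap with the query is zero; in the two-factor space it scores as high as d0, which is the kind of "latent" association the abstract describes.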
results using a large category hierarchy

Marti A. Hearst, Chandu Karadi

July 1997 ACM SIGIR Forum , Proceedings of the 20th annual international ACM 

SIGIR conference on Research and development in information retrieval 
SIGIR '97, volume 31 Issue SI 
Publisher: ACM Press 




Approximating matrix multiplication for pattern recognition tasks
Edith Cohen, David D. Lewis 

January 1997 Proceedings of the eighth annual ACM-SIAM symposium on Discrete 
algorithms 

Publisher: Society for Industrial and Applied Mathematics 




36 InfoCrystal: a visual tool for information retrieval & management
Anselm Spoerri

December 1993 Proceedings of the second international conference on Information 
and knowledge management 

Publisher: ACM Press 





37 HypIR: a hypertext-based approach to information retrieval
Fazli Can, Yuan-Ming Lee 

March 1993 Proceedings of the 1993 ACM/SIGAPP symposium on Applied computing: 
states of the art and practice 

Publisher: ACM Press 





http://portal.acm.org/resultsxfm?query=%2Bclustering%20documents%2C^^^^ 5/15/06 



Results (page 2): +clustering documents, semantics, similarities 






Keywords: browsing, hypercard, hypertext, information retrieval, Information retrieval 
techniques, Inverted files 
June 1993 Proceedings of the 31st annual meeting on Association for Computational
Linguistics
In this paper we present a method to group adjectives according to their meaning, as a
first step towards the automatic identification of adjectival scales. We discuss the
properties of adjectival scales and of groups of semantically related adjectives and how
they imply sources of linguistic knowledge in text corpora. We describe how our system
exploits this linguistic knowledge to compute a measure of similarity between two 
adjectives, using statistical techniques and without having access to ... 

39 Interactive clustering for navigating in hypermedia systems 
Sougata Mukherjea, James D. Foley, Scott E. Hudson 

September 1994 Proceedings of the 1994 ACM European conference on Hypermedia 
technology 

Publisher: ACM Press 
This paper talks about clustering related nodes of an overview diagram to reduce its
complexity and size. This is because although overview diagrams are useful for helping 
the user to navigate in a hypermedia system, for any real-world system these become too 
complicated and large to be really useful. Both structure-based and content-based 
clustering are used. Since the nodes can be related to each other in different ways, 
depending on the situation different clustered views will be useful. ... 

Keywords: clustering, information visualization, navigation, overview diagrams 



40 Fast detection of communication patterns in distributed executions
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced 
Studies on Collaborative research 

Publisher: IBM Press 


Understanding distributed applications is a tedious and difficult task. Visualizations based 
on process-time diagrams are often used to obtain a better understanding of the 
execution of the application. The visualization tool we use is Poet, an event tracer 
developed at the University of Waterloo. However, these diagrams are often very complex 
and do not provide the user with the desired overview of the application. In our 
experience, such tools display repeated occurrences of non-trivial commun ... 
This paper describes a heuristic approach capable of automatically clustering senses in a
machine-readable dictionary (MRD). Including these clusters in the MRD-based lexical
database offers several benefits for word sense disambiguation (WSD). First, the
clusters can be used as a coarser sense division, so unnecessarily fine sense distinctions
can be avoided. The clustered entries in the MRD can also be used as materials for 
supervised training to develop a WSD system. Furthermore, if t ... 
'Probability is expectation founded upon partial knowledge.' (Boole, 1854) Information
retrieval based on stored program electronic computers has been an active area of
research since the time these machines were invented. It is therefore somewhat
surprising that even now no formal computational model for IR exists. There is no well- 
defined logic to describe information retrieval, and there is no proof or model theory to 
talk about the truths of IR. This paper argues t ... 
In information retrieval, cluster analysis is an important tool employed to enhance both
efficiency and effectiveness of the retrieval process. Most clustering algorithms have
difficulty in reflecting the closeness of documents as perceived by the user. A two-phase
scheme for document clustering, whose results reflect the "conceptual" clusters that are 
perceived by the user of the retrieval system, is proposed. Since the clusters obtained by 
this scheme are not characterized in ... 

46 Hypertext databases and data mining
Soumen Chakrabarti

June 1999 ACM SIGMOD Record , Proceedings of the 1999 ACM SIGMOD international 

conference on Management of data SIGMOD '99, volume 28 issue 2 
Publisher: ACM Press 
The volume of unstructured text and hypertext data far exceeds that of structured data.
Text and hypertext are used for digital libraries, product catalogs, reviews, newsgroups, 
medical reports, customer service reports, and the like. Currently measured in billions of 
dollars, the worldwide internet activity is expected to reach a trillion dollars by 2002. 
Database researchers have kept some cautious distance from this action. The goal of this 
tutorial is to expose database researchers to t ... 
Publisher: ACM Press
This article outlines basic videodisc and optical disk technology. Both optical and
capacitance videodisc technology are described. Optical disk technology as a mass digital 
image and data storage device is defined and briefly compared with other established 
information storage media including magnetic tape and microforms. The article includes a 
look into the future of videodisc and optical disk. 
August 1998 Proceedings of the 17th international conference on Computational
linguistics - Volume 1 , Proceedings of the 36th annual meeting on 
Association for Computational Linguistics - Volume 1 

Publisher: Association for Computational Linguistics


Redundancy is a good thing, at least in a learning process. To be a good teacher you must
say what you are going to say, say it, then say what you have just said. Well, three times
is better than one. To acquire and learn knowledge from text for building a lexical
knowledge base, we need to find a source of information that states facts, and repeats
them a few times using slightly different sentence structures. A technique is needed for
gathering information from that source and identifying the red ...
The use of inference networks to support document retrieval is introduced. A network-
based retrieval model is described and compared to conventional probabilistic and Boolean 
models. 
Query processing in a multimedia document system is described. Multimedia documents
are information objects containing formatted data, text, image, graphics, and voice. The 
query language is based on a conceptual document model that allows the users to 
formulate queries on both document content and structure. The architecture of the system 
is outlined, with focus on the storage organization in which both optical and magnetic 
devices can coexist. Query processing and the different strategies ... 
January 1997 Proceedings of the sixth international conference on Information and
We analyze the kinematics of probabilistic term weights at retrieval time for different
Information Retrieval models. We present four models based on different notions of 
probabilistic retrieval. Two of these models are based on classical probability theory and 
can be considered as prototypes of models long in use in Information Retrieval, like the 
Vector Space Model and the Probabilistic Model. The two other models are based on a 
logical technique of evaluating the probability of a conditi ... 

Keywords: logical imaging, probabilistic modeling, probabilistic retrieval 
The problem of using a broker to select a subset of available information servers in order
to achieve a good trade-off between document retrieval effectiveness and cost is 
addressed. Server selection methods which are capable of operating in the absence of 
global information, and where servers have no knowledge of brokers, are investigated. A 
novel method using Lightweight Probe queries (LWP method) is compared with several 
methods based on data from past query processing, while Random and ... 

Keywords: Lightweight Probe queries, information servers, network servers, server 
ranking, server selection, text retrieval 
A general concern about object-oriented systems has been whether or not they are able
to meet the performance demands required to be useful for the development of significant 
production software systems. Attempts to evaluate this assertion have been hampered by 
a lack of meaningful performance benchmarks that compare database operations across 
different kinds of databases. In this paper, we utilize the Sun Benchmark [Rube87] as a 
means for assessing the performance of an object d ... 
This paper is in two parts, following the suggestion that I first comment on my own past
experience in information retrieval, and then present my views on the present and future.
An automatic document retrieval system, programmed for the IBM 7094, is described. The
system is designed to process English texts and search requests, and uses statistical, 
syntactic and semantic procedures for the analysis of information and the identification of 
relevant items. The operations are planned around a central supervisor, which in turn calls 
on the various subroutines, as desired. This organization makes it possible to alter both 
the processing sequences and the mat ... 
We present results of two methods for assessing the event profile of news articles as a
function of verb type. The unique contribution of this research is the focus on the role of 
verbs, rather than nouns. Two algorithms are presented and evaluated, one of which is 
shown to accurately discriminate documents by type and semantic properties, i.e. the 
event profile. The initial method, using WordNet (Miller et al. 1990), produced multiple 
cross-classification of articles, primarily due to the bushy ... 
In this paper, we propose a statistical approach for clustering of articles using on-line
dictionary definitions. One of the characteristics of our approach is that the sense of
every word in the articles is automatically disambiguated using dictionary definitions. The other is
that, in order to cope with the problem of a phrasal lexicon, a linking step, which links words with
their semantically similar words in the articles, is introduced in our method. The results of
experiments demonstrate the effectiveness of the ... 
The general objective of this research has been the enhancement of traditional keyword
based statistical methods of document retrieval with advanced natural language 
processing techniques. In the work to date the focus has been on obtaining a better 
representation of document contents by extracting representative phrases from 
syntactically preprocessed text and devising suitable weighting schemes for different 
types of terms. In addition, statistical clustering methods have been developed that ... 



63 Automatic noun classification by using Japanese-English word pairs 
Naomi Inoue 

June 1991 Proceedings of the 29th annual meeting on Association for Computational 
Linguistics 

Publisher: Association for Computational Linguistics 


This paper describes a method of classifying semantically similar nouns. The approach is 
based on the "distributional hypothesis". Our approach is characterized by distinguishing 
among senses of the same word in order to resolve the "polysemy" issue. The
classification result demonstrates that our approach is successful. 
A classified, or clustered file is one where related, or similar records are grouped into
classes, or clusters of items in such a way that all items within a cluster are jointly
retrievable. Clustered files are easily adapted to broad and narrow search strategies, and 
simple file updating methods are available. An inexpensive file clustering method 
applicable to large files is given together with appropriate file search methods. An abstract 
model is then introduced to predict the retrieval ... 

Keywords: automatic classification, cluster searching, clustered files, fast classification, 
file organization, probabilistic models 
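Searching a clustered file of the kind described above can be sketched as follows: the query is compared with each cluster centroid, and every item in the best-matching clusters is returned together. This is only an illustration of cluster-based search under stated assumptions; it does not reproduce the paper's inexpensive clustering method or its retrieval model. The term vectors, cluster names, cosine matching, and the `n_clusters` parameter (1 for a narrow search, larger for a broad one) are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_search(clusters, query, n_clusters=1):
    """Return the IDs of all items in the n_clusters clusters whose centroids best match query.

    clusters: dict mapping cluster name -> dict of item ID -> term vector.
    """
    ranked = sorted(
        clusters,
        key=lambda name: cosine(centroid(list(clusters[name].values())), query),
        reverse=True,
    )
    hits = []
    for name in ranked[:n_clusters]:
        # All items of a selected cluster are retrieved jointly.
        hits.extend(sorted(clusters[name]))
    return hits


# Hypothetical term order: [car, engine, plant, flower]
clusters = {
    "vehicles": {"doc1": [1, 1, 0, 0], "doc2": [1, 0, 0, 0]},
    "botany": {"doc3": [0, 0, 1, 1], "doc4": [0, 0, 0, 1]},
}
print(cluster_search(clusters, [0, 1, 0, 0]))
print(cluster_search(clusters, [0, 1, 0, 0], n_clusters=2))
```

A narrow search for "engine" returns doc2 along with doc1 even though doc2 lacks that term, because the whole cluster is retrieved; raising `n_clusters` broadens the search to further clusters.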
CLU is a new programming language designed to support the use of abstractions in
program construction. Work in programming methodology has led to the realization that
three kinds of abstractions (procedural, control, and especially data abstractions) are
useful in the programming process. Of these, only the procedural abstraction is supported 
well by conventional languages, through the procedure or subroutine. CLU provides, in 
addition to procedures, novel linguistic mechanisms th ... 

Keywords: control abstractions, data abstractions, data types, programming languages, 
programming methodology, separate compilation 
The 117 manuscripts submitted for the Hypertext '91 conference were assigned to
members of the review committee, using a variety of automated methods based on
information retrieval principles and Latent Semantic Indexing. Fifteen reviewers provided
exhaustive ratings for the submitted abstracts, indicating how well each abstract matched 
their interests. The automated methods do a fairly good job of assigning relevant papers 
for review, but they are still somewhat poorer tha ... 
Choosing an appropriate document representation and search strategy for document
retrieval has been largely guided by achieving good average performance instead of 
optimizing the results for each individual query. A model of retrieval based on plausible 
inference gives us a different perspective and suggests that techniques should be found 
for combining multiple sources of evidence (or search strategies) into an overall 
assessment of a document's relevance, rather than attempting to pick a ... 
Syntactic phrase indexing and term clustering have been widely explored as text
representation techniques for text retrieval. In this paper we study the properties of 
phrasal and clustered indexing languages on a text categorization task, enabling us to 
study their properties in isolation from query interpretation issues. We show that optimal 
effectiveness occurs when using only a small proportion of the indexing terms available, 
and that effectiveness peaks at a higher feature set size and ... 
A specific query establishes a rigid qualification and is concerned only with data that
match it precisely. A vague query establishes a target qualification and is concerned also 
with data that are close to this target. Most conventional database systems cannot handle 
vague queries directly, forcing their users to retry specific queries repeatedly with minor 
modifications until they match data that are satisfactory. This article describes a system 
called VAGUE that can handle vague queries ... 
This paper studies the problem of storing single-level and multilevel clustered files.
Necessary and sufficient conditions for a single-level clustered file to have the consecutive 
retrieval property (CRP) are developed. A linear time algorithm to test the CRP for a given 
clustered file and to identify the proper arrangement of objects, if CRP exists, is 
presented. For the single-level clustered files that do not have CRP, it is shown that the 
problem of identifying a storage organization w ...
Entity-relationship clustering promotes the simplicity that is vital for fast end-user
comprehension, as well as the complexity at a more detailed level to satisfy the database 
designer's need for extended semantic expression in the conceptual model. 
The current role of computers in automatic document processing is briefly outlined, and
some reasons are given why the early promise of library automation and of the 
mechanization of documentation processes has not been fulfilled. A new dynamic 
document environment is then outlined in which clustered files are searched and 
information is retrieved following an interactive user-controlled search process. Methods 
are described for an automatic query modification based on user needs ...
The underlying principle of the DR-LINK System is that retrieval must be at the conceptual
level, not the word level. That is, a successful retrieval system must retrieve on the basis 
of what people mean in their query, not just what they say in their query. The same is 
true of documents - their representation needs to capture the content at the conceptual 
level of expression. To accomplish this human-like goal, DR-LINK aims to represent and 
match documents and queries at all of the avail ... 
Many effective search strategies derived from different models are available for document
retrieval systems. However, it does not appear that there is a single most effective 
strategy. Instead, different strategies perform optimally under different conditions. This 
paper outlines the design of an adaptive document retrieval system that chooses the best 
search strategy for a particular situation and user. In order to be able to support a variety 
of search strategies, a general network representat ... 



91 Special issue on knowledge representation
Ronald J. Brachman, Brian C. Smith 

February 1980 ACM SIGART Bulletin, issue 70

Publisher: ACM Press 


In the fall of 1978 we decided to produce a special issue of the SIGART Newsletter 
devoted to a survey of current knowledge representation research. We felt that there
were two useful functions such an issue could serve. First, we hoped to elicit a clear
picture of how people working in this subdiscipline understand knowledge representation 
research, to illuminate the issues on which current research is focused, and to catalogue 
what approaches and techniques are currently being developed. Secon ... 
User-oriented clustering schemes enable the classification of documents based upon the
user perception of the similarity between documents, rather than on some similarity
function presumed by the designer to represent the user criteria. In this paper, an 
enhancement of such a clustering scheme is presented. This is accomplished by the 
formulation of the user-oriented clustering as a function-optimization problem. The 
problem formulated is termed the Boundary Selection Problem (BSP). Heurist ... 

93 Incremental clustering and dynamic information retrieval 
Moses Charikar, Chandra Chekuri, Tomas Feder, Rajeev Motwani 
Bootstrapping semantics from text is one of the greatest challenges in natural language
learning. We first define a word similarity measure based on the distributional pattern of 
words. The similarity measure allows us to construct a thesaurus using a parsed corpus. 
We then present a new evaluation methodology for the automatically constructed 
thesaurus. The evaluation results show that the thesaurus is significantly closer to 
WordNet than Roget Thesaurus is. 
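The abstract builds its thesaurus from a word similarity measure based on the distributional pattern of words. The sketch below illustrates only the distributional idea, and it makes two substitutions. It uses raw co-occurrence windows in place of the parsed-corpus contexts, and plain cosine similarity in place of the paper's own measure. The function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter

def context_vectors(sentences, window=1):
    # For every word, count the words appearing within `window` positions of it.
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            context = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    context[words[j]] += 1
    return vectors

def cosine_similarity(vectors, a, b):
    va, vb = vectors[a], vectors[b]
    dot = sum(count * vb[word] for word, count in va.items())  # Counter returns 0 for missing keys
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


vectors = context_vectors(["drink hot coffee daily", "drink hot tea daily", "drive fast car daily"])
print(cosine_similarity(vectors, "coffee", "tea") > cosine_similarity(vectors, "coffee", "car"))  # True
```

Words that share neighbours ("coffee" and "tea") score higher than words that share only some of them ("coffee" and "car"). A thesaurus entry for a word would list its highest-scoring neighbours.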
Many experts in mechanized text processing now agree that useful automatic language
analysis procedures are largely unavailable and that the existing linguistic methodologies 
generally produce disappointing results. An attempt is made in the present study to 
identify those automatic procedures which appear most effective as a replacement for the 
missing language analysis. A series of computer experiments is described, designed to 
simulate a conventional document retrieval environ ... 
on Research and development in information retrieval

Partitioning by clustering of very large databases is a necessity to reduce the space/time 
complexity of retrieval operations. However, the contemporary and modern retrieval 
environments demand dynamic maintenance of clusters. A new cluster maintenance 
strategy is proposed and its similarity/stability characteristics, cost analysis, and retrieval 
behavior in comparison with unclustered and completely reclustered database 
environments have been examined by means of a series of experi ... 

98 Information storage and retrieval: a survey and functional description
Jack Minker

September 1977 ACM SIGIR Forum, volume 12 issue 2 

Publisher: ACM Press 


Information Storage and Retrieval (IS&R) encompasses a broad scope of topics ranging 
from basic techniques for accessing data to sophisticated approaches for the analysis of 
natural language text and the deduction of information. Within the field, three general 
areas of investigation can be distinguished not only by their subject matter but also by the 
types of individuals presently interested in them: (1) Document retrieval, (2) Generalized
data management, and (3) Question-answering. A functional ...

Keywords: automatic indexing, data management, data structures, deductive search, 
information retrieval, natural language, problem solving, question-answering, relational 
data systems, theorem proving 
We propose a new navigation paradigm based on a spatial metaphor to help users access
and navigate within large sets of documents. This metaphor is implemented by a 
computer artifact called an Interactive Dynamic Map (IDM). An IDM plays a role similar to 
the role of a real map with respect to physical space. Two types of IDMs are computed 
from the documents: Topic IDMs represent the semantic contents of a set of documents 
while Document IDMs visualize a subset of documents ... 

Keywords: information retrieval, interaction paradigm, maps, navigation, visualization
Our work focuses on identifying various types of lexical data in large corpora through
statistical analysis. In this paper, we present a method for grouping adjectives according 
to their meaning, as a step towards the automatic identification of adjectival scales. We 
describe how our system exploits two sources of linguistic knowledge in a corpus to 
compute a measure of similarity between two adjectives, using statistical techniques and 
a clustering algorithm for grouping. We evaluate the signif ... 

105 Protofoil: storing and finding the information worker's paper documents in an
electronic file cabinet 

Ramana Rao, Stuart K. Card, Walter Johnson, Leigh Klotz, Randall H. Trigg 
April 1994 Proceedings of the SIGCHI conference on Human factors in computing systems: celebrating interdependence
Publisher: ACM Press 





Keywords: ad hoc information work, document imaging, filing of paper documents, 
information retrieval, paper user interface 



106 Description of EDCS technology clusters

September 1997 ACM SIGSOFT Software Engineering Notes, volume 22 issue 5 
Publisher: ACM Press 


Evolutionary Systems are those that are capable of accommodating change over an
extended system lifetime with reduced risk and cost/schedule impact. Most of our 
complex defense systems depend on software for their successful operation and, as a 
result, the software in those systems is the primary vehicle for adapting to change. The
EDCS (Evolutionary Design of Complex Software) Program is providing for the 
development and experimental application of new software technologies which can enable 
signi ... 
Several graph theoretic cluster techniques aimed at the automatic generation of thesauri
for information retrieval systems are explored. Experimental cluster analysis is performed 
on a sample corpus of 2267 documents. A term-term similarity matrix is constructed for 
the 3950 unique terms used to index the documents. Various threshold values, T, are 
applied to the similarity matrix to provide a series of binary threshold matrices. The 
corresponding graph of each binary thres ... 
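The abstract describes thresholding a term-term similarity matrix at a value T and reading each binary matrix as a graph. The sketch below takes connected components of that graph as the clusters. Connected components are only one of the graph-theoretic cluster definitions such a study could compare (cliques and stars are others), so this is an assumption, not necessarily the paper's choice. The function name and the toy matrix are hypothetical.

```python
def threshold_clusters(sim, threshold):
    """Cluster terms as connected components of a threshold graph (sketch).

    sim: symmetric term-term similarity matrix (list of lists).
    threshold: the value T; terms i and j are linked when sim[i][j] >= T.
    """
    n = len(sim)
    # Binary threshold matrix: True where two distinct terms are linked.
    linked = [[i != j and sim[i][j] >= threshold for j in range(n)] for i in range(n)]

    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search gathers every term reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for other in range(n):
                if linked[node][other] and other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters


sim = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]
print(threshold_clusters(sim, 0.5))  # [[0, 1], [2, 3]]
print(threshold_clusters(sim, 0.7))  # [[0, 1], [2], [3]]
```

Raising T removes edges, so clusters split into smaller, tighter groups. This effect is why a series of threshold values is examined.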
One of the main components of integrated office systems is the large central filing system.
It efficiently stores, retrieves and searches office documents containing text, images,
graphics, data and voice. We propose to implement a filing system on top of the 
Darmstadt database system (DASDBS), which is designed as a data management kernel 
for both standard and non-standard applications. This paper investigates the choice of 
appropriate storage structures for the filing system objects and th ... 
In spite of the increasing sophistication and power of commercial spreadsheet packages,
we still lack a formal theory or a methodology to support the construction and 
maintenance of spreadsheet models. Using a dual logical/physical perspective, we identify 
four principal components that characterize any spreadsheet model: schema, data,
editorial, and binding. We present a factoring algorithm for identifying and extracting 
these components ... 
A very promising idea for fast searching in traditional and multimedia databases is to map
objects into points in k-d space, using k feature-extraction functions, provided by a 
domain expert [25]. Thus, we can subsequently use highly fine-tuned spatial access 
methods (SAMs), to answer several types of queries, including the 'Query By Example' 
type (which translates to a range query); the 'all pairs' query (which translates to a 
spatial join [8]); the nearest-neighbor or best-match ... 
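The abstract maps objects to points in k-d space with k feature-extraction functions, so that 'Query By Example' becomes a range query. The sketch below shows that mapping. A linear scan stands in for the spatial access method a real system would use. The two string extractors (length and vowel count) are hypothetical examples, not features from the paper. `math.dist` requires Python 3.8 or later.

```python
import math

def to_point(obj, extractors):
    # Each feature-extraction function contributes one coordinate of the k-d point.
    return tuple(extract(obj) for extract in extractors)

def query_by_example(objects, extractors, example, radius):
    """Return objects whose feature points lie within `radius` of the example's point.

    A linear scan stands in for the spatial access method (for example an
    R-tree) that a real system would use to answer the same range query.
    """
    center = to_point(example, extractors)
    return [obj for obj in objects
            if math.dist(to_point(obj, extractors), center) <= radius]


# Hypothetical extractors for short strings: length and number of vowels.
extractors = [len, lambda s: sum(ch in "aeiou" for ch in s)]
words = ["cat", "cart", "elephant", "dog"]
print(query_by_example(words, extractors, "bat", radius=1))  # ['cat', 'cart', 'dog']
```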
AIR represents a connectionist approach to the task of information retrieval. The system
uses relevance feedback from its users to change its representation of authors, index
terms, and documents so that, over time, AIR improves at its task. The result is a
representation of the consensual meaning of keywords and documents shared by some
group of users. The central goal of this paper is to use our experience with AIR to
highlight those characteristics of connectionist ... 
We propose a new method of classifying documents into categories. We define for each
category a finite mixture model based on soft clustering of words. We treat the problem of 
classifying documents as that of conducting statistical hypothesis testing over finite 
mixture models, and employ the EM algorithm to efficiently estimate parameters in a 
finite mixture model. Experimental results indicate that our method outperforms existing 
methods. 
This paper addresses the problem of how to identify the intended meaning of individual
words in unrestricted texts, without necessarily having access to complete 
representations of sentences. To discriminate senses, an understander can consider a 
diversity of information, including syntactic tags, word frequencies, collocations, semantic 
context, role-related expectations, and syntactic restrictions. However, current 
approaches make use of only small subsets of this information. Here we will des ... 
HNC Software, Inc. has developed a system called DOCUVERSE for visualizing the
information content of large textual corpora. The system is built around two separate
neural network methodologies: context vectors and self organizing maps. Context vectors 
(CVs) are high dimensional information representations that encode the semantic content 
of the textual entities they represent. Self organizing maps (SOMs) are capable of 
transforming an input, high dimensional signal space into a much lower (usua ... 
This paper describes our experience implementing CES, a distributed Collaborative Editing
System written in Argus, a language that includes facilities for managing long-lived
distributed data. Argus provides atomic actions, which simplify the handling of
concurrency and failures, and mechanisms for implementing atomic data types,
which ensure serializability and recoverability of actions that use them. This paper focuses 
on the support for atomicity in Argus ... 
determination of positive as well as negative relationships between terms is evaluated.
The term relationships are incorporated into the retrieval process by using a generalized
similarity function that has a term match component, a positive term relationship 
component, and a negative term relationship component. Two strategies, query 
partitioning and query clustering, for the evaluation of the effectiv ... 
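The abstract names three components of its generalized similarity function: term match, positive term relationship, and negative term relationship. The sketch below combines them as a weighted sum over binary term sets. The weighted sum, the weights `alpha`, `beta`, `gamma`, and the representation of relationships as unordered term pairs are all hypothetical. The paper's actual formulation may differ.

```python
def generalized_similarity(query, doc, positive, negative, alpha=1.0, beta=0.5, gamma=0.5):
    """Score a document against a query from three components (hypothetical weighting).

    query, doc: sets of index terms.
    positive, negative: sets of frozenset({term_a, term_b}) relationship pairs.
    """
    matches = len(query & doc)  # term match component
    # All query/document term pairs that could carry a relationship.
    pairs = {frozenset((q, d)) for q in query for d in doc if q != d}
    pos = len(pairs & positive)  # positive term relationship component
    neg = len(pairs & negative)  # negative term relationship component
    return alpha * matches + beta * pos - gamma * neg


score = generalized_similarity(
    {"car", "fast"},
    {"automobile", "fast", "slow"},
    positive={frozenset(("car", "automobile"))},
    negative={frozenset(("fast", "slow"))},
)
print(score)  # 1.0
```

A positively related pair ("car" and "automobile") adds to the score even without an exact term match. A negatively related pair ("fast" and "slow") subtracts from it.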
This article presents a customizable architecture for software agents that capture and
access information in large, heterogeneous, distributed electronic repositories. The key 
idea is to exploit underlying structure at various levels of granularity to build high-level 
indices with task-specific interpretations. Information agents construct such indices and 
are configured as a network of reusable modules called structure detectors and 
segmenters. We illustrate our architectu ... 
Query expansion methods have been studied for a long time - with debatable success in
many instances. In this paper we present a probabilistic query expansion model based on 
a similarity thesaurus which was constructed automatically. A similarity thesaurus reflects 
domain knowledge about the particular collection from which it is constructed. We address 
Decomposing complex software systems into conceptually independent subsystems
represents a significant software engineering activity that receives considerable research
attention. Most of the research in this domain deals with the source code, trying to cluster
together files which are conceptually related. In this paper we propose using a more 
informal source of information: file names. We present an experiment which shows that 
file naming convention is the best file clustering criteria for the ... 
Lexical ambiguity is a pervasive problem in natural language processing. However, little
quantitative information is available about the extent of the problem or about the impact
that it has on information retrieval systems. We report on an analysis of lexical ambiguity 
in information retrieval test collections and on experiments to determine the utility of 
word meanings for separating relevant from nonrelevant documents. The experiments 
show that there is considerable ambiguity even in a s ... 
One problem in computer program testing arises when errors are found and corrected
after a portion of the tests have run properly. How can it be shown that a fix to one area 
of the code does not adversely affect the execution of another area? What is needed is a 
quantitative method for assuring that new program modifications do not introduce new 
errors into the code. This model considers the retest philosophy that every program 
A well constructed thesaurus has long been recognized as a valuable tool in the effective
operation of an information retrieval system. This paper reports the results of 
experiments designed to determine the validity of an approach to the automatic 
storage libraries for cost reasons. This paper develops an integrated approach to the
vertical data migration between the tertiary, secondary, and primary storage in that it
reconciles speculative prefetching, to mask the high latency of the tertiary storage, with 
the replacement policy of the document caches at the secondary and primary storage 
It is widely accepted that tagging text with semantic information would improve the
quality of lexical learning in corpus-based NLP methods. However, available on-line
taxonomies are rather entangled and introduce an unnecessary level of ambiguity. The 
noise produced by the redundant number of tags often overrides the advantage of 
semantic tagging. In this paper we propose an automatic method to select from WordNet 
a subset of domain-appropriate categories that effectively reduce the overambiguit ... 
Most common database management systems represent information in a simple
record-based format. Semantic modeling provides richer data structuring capabilities for
database applications. In particular, research in this area has articulated a number of
constructs that provide mechanisms for representing structurally complex interrelations
among data typically arising in commercial applications. In general terms, semantic 
modeling complements work on knowledge representation (in artificial int ... 
Flatland is an augmented whiteboard interface designed for informal office work. Our
research investigates approaches to building an augmented whiteboard in the context of 
continuous, long term office use. In particular, we pursued three avenues of research 
based on input from user studies: techniques for the management of space on the board, 
the ability to flexibly apply behaviors to support varied application semantics, and 
mechanisms for managing history on the board. Unlike some p ... 
The vast amount of textual information available today is useless unless it can be
effectively and efficiently searched. The goal in information retrieval is to find documents
that are relevant to a given user query. We can represent a document collection by a
matrix whose (i, j) entry is nonzero only if the ith term appears in the jth document; thus
each document corresponds to a column vector. The query is also represented as a
column v ... 
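The abstract represents a collection as a term-document matrix whose columns are documents, with the query as one more column vector. The abstract is truncated before it says how the query is compared with the documents. The sketch below assumes plain cosine similarity, a common vector-space choice, and uses raw term counts as matrix entries. Both choices are illustrative assumptions, and the function names are hypothetical.

```python
import math

def term_document_matrix(docs, vocab):
    # Entry (i, j) counts occurrences of term i in document j: documents are columns.
    return [[doc.split().count(term) for doc in docs] for term in vocab]

def rank_documents(matrix, query):
    """Order document (column) indices by cosine similarity to the query column vector."""
    q_norm = math.sqrt(sum(q * q for q in query))

    def cosine(j):
        column = [row[j] for row in matrix]
        norm = math.sqrt(sum(c * c for c in column)) * q_norm
        return sum(c * q for c, q in zip(column, query)) / norm if norm else 0.0

    return sorted(range(len(matrix[0])), key=cosine, reverse=True)


vocab = ["retrieval", "cluster", "query"]
docs = ["retrieval query query", "cluster cluster", "retrieval cluster"]
matrix = term_document_matrix(docs, vocab)
print(rank_documents(matrix, [1, 0, 1]))  # [0, 2, 1]
```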
The object-oriented paradigm is becoming very popular for database applications and
several object-oriented DBMSs have been developed. A basic notion in this paradigm is 
the inheritance hierarchy that allows the users to define objects and the associated 
operations starting from already defined objects. However, in database applications the 
inheritance hierarchy must provide a conceptual modeling function, in addition to the
reusability function. Another important requirement is to provide ...
This article surveys techniques used in structured and object-oriented software
specification methods. The techniques are classified as techniques for the specification of 
external interaction and internal decomposition. The external specification techniques are 
further subdivided into techniques for the specification of functions, behavior, and
communication. After surveying the techniques, we summarize the way they are used in 
structured and object-oriented methods and indicate ways in w ...
We describe the early stage of our methodology of knowledge acquisition from technical
texts. First, a partial morpho-syntactic analysis is performed to extract "candidate terms".
Then, the knowledge engineer, assisted by an automatic clustering tool, builds the 
"conceptual fields" of the domain. We focus on this conceptual analysis stage, describe the 
data prepared from the results of the morpho-syntactic analysis and show the results of 
the clustering module and their interpretation. We found ... 
In information-filtering environments, uncertainties associated with changing interests of
the user and the dynamic document stream must be handled efficiently. In this article, a 
filtering model is proposed that decomposes the overall task into subsystem functionalities 
and highlights the need for multiple adaptation techniques to cope with uncertainties. A 
filtering system, SIFTER, has been implemented based on the model, using established 
techniques in information retrieval and artificia ... 
A component theory of information retrieval using single content terms as components for
queries and documents was reviewed and experimented with. The theory has the
advantages of being able to (1) bootstrap itself, that is, define initial term weights
naturally based on the fact that items are self-relevant; (2) make use of within-item term
frequencies; (3) account for query-focused and document-focused indexing and retrieval 
strategies cooperatively; and (4) allow for component-specific fe ... 
Persistent Application Systems (PASs) are of increasing social and economic importance.
They have the potential to be long-lived, concurrently accessed, and consist of large 
bodies of data and programs. Typical examples of PASs are CAD/CAM systems, office 
automation, CASE tools, software engineering environments, and patient-care support 
systems in hospitals. Orthogonally persistent object systems are intended to provide 
improved support for the design, construction, maintenance, and operation o ...
We present a methodology for summarization of news about current events in the form of
briefings that include appropriate background (historical) information. The system that we 
developed, SUMMONS, uses the output of systems developed for the DARPA Message 
Understanding Conferences to generate summaries of multiple documents on the same or 
We describe a method for automatic word sense disambiguation using a text corpus and a
machine-readable dictionary (MRD). The method is based on word similarity and context
similarity measures. Words are considered similar if they appear in similar contexts;
contexts are similar if they contain similar words. The circularity of this definition is
resolved by an iterative, converging process, in which the system learns from the corpus a
set of typical usages for each of the senses of the polysemou ... 
The paper proposes methods and tools for building reusable components from families of
Information System conceptual schemas, based on the identification of similar
components in different schemas, and on their engineering into normalized descriptions.
Clustering and abstraction techniques to help identify similar components, and
techniques to build corresponding reusable components are described in the paper. 
In this paper we outline a research program for computational linguistics, making
extensive use of text corpora. We demonstrate how a semantic framework for lexical 
knowledge can suggest richer relationships among words in text beyond that of simple co- 
This article surveys probabilistic approaches to modeling information retrieval. The basic
concepts of probabilistic approaches to information retrieval are outlined and the 
Research papers available on the World Wide Web (WWW or Web) are often poorly
organized, often exist in forms opaque to search engines (e.g. Postscript), and increase in 
The Portable AI Lab (PAIL) is an integrated collection of modules implementing
established AI tools and techniques intended for use as a resource for teaching or learning
artificial intelligence (AI). It aims to illustrate AI concepts by providing a set of incrementally
complex demonstration programs, interactive tools to develop new examples, on-line
context-sensitive documentation with the description of the techniques involved,
bibliographic references, and a set of exercises and projects. PAIL ...
In this paper, we describe a program of research designed to explore how a lexical
semantic theory may be exploited for extracting information from corpora suitable for use
in Information Retrieval applications. Unlike purely statistical collocational analyses,
the framework of a semantic theory allows the automatic construction of predictions 
about semantic relationships among words appearing in collocational systems. We 
illustrate the approach for the acquisition of lexical informa ... 
Natural language query formulations exhibit advantages over artificial language
statements since they permit the user to approach the retrieval environment without prior 
training and without using intermediaries. To obtain adequate retrieval output, it is 
however necessary to emphasize the good terms and to deemphasize the bad ones. The 
usefulness of the terms in a natural language vocabulary is first characterized in terms of 
their frequency distribution over the documents of a collection. The ... 
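The abstract characterizes term usefulness by frequency distribution over the documents of a collection. Inverse document frequency is one standard way to turn that distribution into a weight: terms concentrated in few documents get high weight, and terms spread across the whole collection get low weight. The truncated abstract does not say which measure it uses, so idf here is an assumption, and the function name is hypothetical.

```python
import math

def idf_weights(docs):
    """Map each term to log(N / df), where df is the number of documents containing it."""
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        for term in set(doc.split()):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


weights = idf_weights(["the retrieval system", "the cluster", "the query system"])
print(weights["the"])  # 0.0
print(weights["retrieval"] > weights["system"] > 0)  # True
```

A term such as "the", which occurs in every document, gets zero weight and so is deemphasized. A term found in only one document is emphasized.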
We developed a prototype information retrieval system which uses advanced natural
language processing techniques to enhance the effectiveness of traditional key-word 
based document retrieval. The backbone of our system is a statistical retrieval engine 
which performs automated indexing of documents, then search and ranking in response 
to user queries. This core architecture is augmented with advanced natural language 
processing tools which are both robust and efficient. In early experiments, the ... 
This paper presents an overview of the Cedar programming environment, focusing on its
overall structure, that is, the major components of Cedar and the way they are
organized. Cedar supports the development of programs written in a single programming 
language, also called Cedar. Its primary purpose is to increase the productivity of 
The development of the theory of library classification and of subject indexing, for the
organisation, storage and retrieval of subjects embodied in documents has a striking
parallelism to the search for 'universal forms' and 'deep structure' in language and
linguistic studies. The significant contributions of the theories of classification and subject 
indexing are the subject analysis techniques of Ranganathan and Bhattacharyya's POPSI. 
A computer based system, for generating an information retr ... 
We describe work on the visualization of bibliographic data and, to aid in this task, the
application of numerical techniques for multidimensional scaling. Many areas of scientific 
research involve complex multivariate data. One example of this is Information Retrieval. 
Document comparisons may be done using a large number of variables. Such conditions 
do not favour the more well-known methods of visualization and graphical analysis, as it
is rarely feasible to map each variable ... 